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ABSTRACT 

Linear regression is common in astronomical analyses. I discuss a Bayesian hierarchical mod¬ 
eling of data with heteroscedastic and possibly correlated measurement errors and intrinsic 
scatter. The method fully accounts for time evolution. The slope, the normalization, and the 
intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the 
independent variable is approximated using a mixture of Gaussian distributions whose means 
and standard deviations depend on time. The method can address scatter in the measured 
independent variable (a kind of Eddington bias), selection effects in the response variable 
(Malmquist bias), and departure from linearity in form of a knee. I tested the method with 
toy models and simulations and quantified the effect of biases and inefficient modeling. The 
R-package LIRA (Linear Regression in Astronomy) is made available to perform the regres¬ 
sion. 
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1 INTRODUCTION 


Astronomy and statistics have an interwoven history ([Feig elson &| 
|Babu|20T^ . Linear regression is one of the most frequently used 
statistical techniques in astronomical data analysis. There is an im¬ 
pressive variety of methods to estimate functional relationships be¬ 
tween variables. 

Linear regression is kind of easy. We can draw a line which 
nicely interpolates a distribution of points on a paper by eye. Con¬ 
necting dots and forming a regular pattern is a game for kids. Diffi¬ 
culties lie in refining the results and uncovering the quantity we are 
really looking for. As an example, the ordinary least square estima¬ 
tor is elegant and powerful. Still, results may be meaningless if we 
apply it out of its range of validity. 

Most astronomical data analyses feature intrinsic scatter about 
the regression line. Measurement errors can affect both the inde¬ 
pendent and the dependent variables. Errors may be heteroscedas¬ 
tic, i.e., they differ, and possibly correlated. The intrinsic distribu¬ 
tion of the independent variables may be irregular or not uniform. 
The independent variable may be hidden and we could measure just 
a proxy of it. Selection effects can make the observed sample not 
representative of the population we want to study. 

These aspects influence regression results and can make the 
use of some statistical estimators inappropriate. Many methods 
have been proposed to tackle these effects lAkritas & Bershadyl 


^ |Kelly||2507l [Isobe et al.||1990[ [Hogg, Bovy & Lang||2010[ 


Feigelson & Babu||2012 and references therein). Here, we are 

mostly interested in methods assuming that the dispersions, either 
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the intrinsic scatter or the uncertainties in the measurement process, 
are Gaussian. Non-Gaussian multivariate datasets need generalized 
linear methods ( |de Souza et al.|20T5| l. 

Some statistical papers had the great merit to clarify the in¬ 
volved problematics to the astronomical community. [Akritas ^ 
|Bershady| ^1996| l proposed the BCES estimator (Bivariate Corre¬ 
lated Errors and intrinsic Scatter) which accommodates intrinsic 
scatter in addition to correlated, heteroscedastic measurement er¬ 
rors on both variables by correcting the observed moments of the 
data. 

|Kelly| ( [2007) l described a Bayesian method (MLINMIX) based 
on the likelihood function of the measured data. The method can 
account for measurement errors, intrinsic scatter, multiple indepen¬ 
dent variables, non-detections, and selection effects in the indepen¬ 
dent variable. |Kelly | j2007^ emphasized that the underlying distri¬ 
bution of covariates in a regression has to be modeled to get un¬ 
biased regression parameters and he proposed to approximate the 
intrinsic distribution of the independent variables as a mixture of 
Gaussian functions. This modeling is flexible when estimating the 
distribution of the true values of the independent variable and it is 
robust against model mispecification. 

Recently, |Mantz I pO 15 | l extended the MLINMIX algorithm to 
the case of multiple response variables and he described how to 
model the prior distribution of covariates using a Dirichlet process 
rather than a mixture. Alternative approaches based on generative 
models for the data were proposed too ( |Hogg, Bovy & Lang|2010[ 
[Robotham & Obreschkow|2015t . 

Here, we build upon these methods to develop a linear regres¬ 
sion tool optimized to the study of time evolving scaling relations. 
Some remarkable features show up in astronomical studies. As dis- 
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2 M. Sereno 


cussed above, astronomical data sets can be affected by heterosce- 
datic, correlated errors in both variables and by intrinsic scatter 
around the regression line. 

Furthermore, the scaling parameters of the studied phe¬ 
nomenological relationship may not be constant with time. The 
source of intrinsic scatter is the variation of the physical proper¬ 
ties which can be time dependent. The slope of the relation may be 
time dependent too if some physical processes are more conspicu¬ 
ous either early or lately. Time has then a special role and cannot be 
treated as a simple independent variable in a multivariate analysis. 

In astronomical analyses, we are often interested in the corre¬ 
lation among an observable quantity against a variable which we 
do not have access to, e.g., the mass of a black hole, the mass of a 
galaxy cluster, the star formation rate of a galaxy. We cannot really 
measure these quantities but just scattered proxies of them, e.g., the 
weak lensing mass of a galaxy in place of the true mass. 

Differently from gravity, a lot of astrophysical phenomena are 
scale dependent. Some baryonic processes may be triggered above 
some thresholds and be ineffective below. This can break linearity. 

Selection effects and heterogeneity can make the astronomical 
sample used in the regression not representative of the population 
we are interested in. The sample may be sparse or selected accord¬ 
ing to the value of the response variable, as in a flux limited survey. 

I discuss a hierarchical Bayesian method to deal with the 
above aspects. Its main assumption is that scatters and uncertainties 
are Gaussian. Part of it has been already presented and employed 
in the CoMaLit (Comparing MAsses in Literature) series of papers 
(|Sereno & Ettori|2015bl CoMaLit-I, [Sereno, Ettori & Moscardini| 
[20T5l CoMaLit-IL fSereno & Ettori|2015a| CoMaLit-IV). 

The method shares important features with other recently de¬ 
veloped methods. |Maughan| ( |2014^ proposed a model to constrain 
simultaneously the form and evolution of the scaling relations. The 
method distinguish between measured values, intrinsic scattered 
values, and model values and can constrain the intrinsic scatter and 
its covariance. Correlation among intrinsic scatters has to be con¬ 
sidered in multivariate analyses to obtain unbiased scaling relations 
dEvrard et al.|2014[[RoTO et al.|2014||Mantz et al.|2010||2015| ). 

The paper is as follows. Ins Sec.|^ I introduce power-law scal¬ 
ing relations and their linear counterparts in logarithmic space. The 
hierarchical Bayesian model is presented in Sec. Section]^ is 
devoted to simulations and algorithm testing. Final considerations 
are in Sec.|^ Appendix [A| gives some information on the package 
accompanying the paper. Appendix [B] gives some hints about the 
modeling of the time evolution. 

If needed, I adopt the same conventions and notations of 
the CoMaLit series. The frame-work cosmological model is the 
concordance flat ACDM universe with matter density parameter 
Dm = 0.3; H{z) is the redshift dependent Hubble parameter and 
Ez = H{z)/Ho. Tog’ is the logarithm to base 10 and Tn’ is the 
natural logarithm. 

The method described in the present paper has been imple¬ 
mented in the R languag^ The package is named LIRA (Linear 
Regression in Astronomy) and it is publicly available from CRAN 
(Comprehensive R Archive Network) or GitHub, see App.jA) 


2 LINEAR SCALING 

Most of the scaling relations we deal with in astronomy are time 
evolving power-laws. This simple schematism is supported by ob¬ 
servations, theoretical considerations, and numerical simulations 
( [Stanek et al.|2010]|Giodini et al.|20l3t . The general form of the re¬ 
lation between two properties, e.g., the observable O and the mass 
M, is 

O oc (1) 

where /3 is the slope and the redshift evolution in the median scaling 
relation is accounted for by the factor . According to the context, 
the redshift factor may be either E;^ or the factor (1 -F 2 ). In 
logarithmic variables, the scaling relation is linear and the scatter is 
Gaussian, 

log O = a -F /3 log M -F 7 log . (2) 

In the following, T = logE^. If spectroscopically determined, 
measurement uncertainties in redshift are negligibl^ The relative 
uncertainty in photometric redshifts can be small too, and usually 
smaller than uncertainties in other measurable properties, such as 
mass, luminosity, or temperature. I will not consider redshift uncer¬ 
tainties in the following. 

In the usual framework, the time evolution does not depend 
on the mass scale and only affects the normalization. This is sup¬ 
ported by the self-similar scenario ( [Giodini et al.|[20T^ , where 
the factor F^ for observable properties measured within the same 
over-density radius is Ez. However, the interplay between differ¬ 
ent physical processes that can be more or less effective at different 
times and can make the slope time dependent, j3{z). Assuming that 
the evolution of the slope with redshift is linear in T, Eq. can be 
generalized as 

Y = a + l3X + '^T + 5XT. (3) 

where X = log M, and Y = log O. X and Y are random latent 
variables with intrinsic scatter. The time variable T is determinis¬ 
tic, not affected by measurement errors (which I neglect). Latent 
and observed variables are related through a structural equation 
model ( |Kelly|20071 and references therein). In statisticians’ terms, 
the critical criterion is linearity in the model parameters, not in the 
model variables, which makes Eq. a linear model. 


3 REGRESSION SCHEME 

The Bayesian regression model presented in the following is a mea¬ 
surement error model jEeigelson & Babu|2012| >. Measurement er¬ 
rors are involved in the hierarchical structure and incorporated into 
the model. I assume that all scatter terms, i.e., intrinsic scatter and 
measurement errors, are Gaussian with zero mean although with 
different variances. 

Linear regression in astronomy is usually characterized by in¬ 
trinsic scatter around the scaling relation and measurement errors 
in both the independent and the dependent variables. I assume that 
the covariate variable Xz and the response variable Yz, which are 
latent, fall exactly on a straight line. This is the underlying rela¬ 
tion we want to discover. The latent variables cannot be measured. 
We can measure their proxies X and Y, which differ from Xz and 
Yz for the intrinsic scatters. These are intrinsic deviations of data 


^ http://www.r-project.org 


^ I am not considering catastrophic errors. 
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Table 1. Parameters of the regression scheme and their description. The variables Z is the covariate, X is a proxy of Z, and Y is the response variable. 
2 = is the user defined reference redshift. D is either the luminosity or angular diameter distance. Fz and D are normalized such that i^z (-Zref) = 1 
and D{zrQf) = 1. LIRA is highly customizable and priors can be set by the user among the distributions defined in JAGS. In addition, LIRA defines the 
prec. dgamma prior too. Priors in square brackets can be set only as delta distributions. See Sec.|3.7|and the LIRA user manual for details. 


Type Meaning 


Symbol 

Code symbol 

Default prior 


Y-Z scaling 




Yz — OlY\z + 0Y\zZ + ''1y\zT + Y\z'Z T 

Conditional scaling relation intercept 



alpha.YIZ 

dunif 

slope 


Py\x 

beta.YIZ 

dt 

time evolution 


1y\z 

gamma.YIZ 

dt 

time tilt 


Y\z 

delta.YIZ 

0 

Yz — OlY\ZMee + /^ylz.knce-^ + 'lY\Z,ka<xT + ^ 

Scaling relation before the break slope for Z < Z^nee 


/ly|X,laicc 

beta.YIZ . knee 

beta.YIZ 

time tilt for Z < Z^nee 


^y|Z,knec 

delta . YIZ . knee 

delta.YIZ 

/knee(-Z) = 1/(1 + exp[(Z - Zta,ee)/Wc]) 

Transition function break scale 


Z(aiee 

Z , knee 

dunif 

break length 

X-Z scaling 

^knee 

1 . knee 

le-04 

Xz — otx\z + Px\zZ + 7x|z2" + &x\zZ T 

Proxy of the independent variable bias 


“X|Z 

alpha.XIZ 

0 

slope 


/lx|Z 

beta.XIZ 

1 

time evolution 


7x|z 

gamma.XIZ 

0 

time tilt 


^XIZ 

delta.XIZ 

0 


Scatters 

^Y\Z = Wy\Z,0 + /knee(-^)(crr|Z,0,knee “ ^Y\Z,o)]^z ]J^^Y\Z,D 


Intrinsic scatter 

scatter at 2: = Zref for Z ^ Z^nee 

^Y\Z ,0 

sigma.YIZ .0 

prec.dgamma 


scatter at 2: = Zref for Z < Z^j^ee 

' 7 y|Z, 0 ,knce 

sigma.YIZO.knee 

sigma.YIZ .0 


time evolution with Fz 

'y^Y\Z,Fz 

gamma.sigma.YIZ.F z 

0 


time evolution with D 

To“V"|Z,£> 

gamma.sigma.YIZ.D 

0 

O'< 7 ^ 1 Z 

'^X\Z = ^X\ZfiF'z 

£)'y‘^x\z,n 




Intrinsic scatter of the proxy scatter a\. z = z^^f 

'^XIZ.O 

sigma.XIZ .0 

0 


time evolution with Fz 

'y^X\Z,Fz 

gamma.sigma.XIZ.Fz 

0 


time evolution with D 

'y^X\Z,Fz 

gamma.sigma.XIZ.D 

0 

'ypx 

PXY\Z = PXY\Z, 0 ^z 

Y\Z,Fz £ppXY\Z,D 




Intrinsic scatter correlation 

correlation at 2: = 2:ref 

PXY\Z ,0 

rho.XYIZ.O 

0 


time evolution with Fz 

'^PXY\Z,Fz 

gamma.rho.XYIZ.Fz 

0 


time evolution with D 

'yPXY\Z,Fz 

gamma.rho.XYIZ.D 

0 


Intrinsic distribution of the independent variable 



p{z) OC [Xlfc -^k J^{pz,k 





Gaussian mixture 

number of components 

^mix 

n.mixture 

[1] 


weights of the components 


pi [k] 

ddirch 


minimum Z value (only for = 1 ) 

7 . 

■^min 

Z. min 

—oo 


maximum Z value (only for n^ix = 1 ) 

Zmax 

Z. max 

+ CXD 

Pz,k{z) = Pz,0k + 7 /Jz, 

fY + lk.z,D'^°sD 




Means of the Gaussian 

mean of the first component at 2 = 2ref 

Pz,01 

mu.Z.0 

dunif 

components 

means of the additional components 
{ 2 ^ k ^ rinjix) at z = 

PZ, 0 k 

mu.Z.0.mixture[k] 

dunif 


time evolution with Fz 

'yp-zFz 

gamma.mu.Z.F z 

dt 


time evolution with D 


gamma.mu.Z.D 

dt 

/ \ 7 > R' 

(^Z,k(z) = (Lz^OkFz 

2 £) 7 <T 2 ,D 




Standard deviations of the 

deviation of the first component at 2 = 2ref 


sigma.Z.0 

prec.dgamma 

Gaussian components 

deviations of the additional components 

(2 ^ /l ^ ^mix) at 2 = 2ref 

<^Z, 0 k 

sigma.Z.0.mixture[k] 

prec.dgamma 


time evolution with Fz 

T'.tz.F, 

gamma.sigma.Z.F z 

0 


time evolution with D 

7 ct^,D 

gamma.sigma.Z.D 

0 
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points from the intrinsic scaling relation that are present even if all 
measurements were made with perfect precision and accuracy. 

The proxies X and Y are linked to the observed manifest vari¬ 
ables X and y with additional error terms. We could measure X and 
Y only in an ideal experiment with infinite accuracy and precision. 

The measured values of x and y and their known measurement 
errors are the inputs to the model. The variables X,Y, Xz and Yz 
have to be determined in the regression procedure. The regression 
scheme is summarized in Table [T] and described in details in the 
following. 

3.1 Linear scaling 

The linear relation between two unscattered quantities (I am not 
counting the time) can be expressed as 

Yz = Q.y\z + Py\zZ +'yY\zT + Sy\zZ (4) 

where a denotes the normalization, the slope /? accounts for the 
dependence with Z, the slope 7 accounts for the time-evolution of 
the normalization and S quantifies the tilt of the slope with time. 

The basic case summarized in Eq. is enough to describe the 
regression of Yz against a variable which is directly observable. 
This is the case of the luminosity versus temperature relation of 
galaxy clusters. In some other cases, the independent variable Z is 
not directly available from observations. For example, we cannot 
measure the mass of a cluster (Z), but we can approximate it with 
the weak lensing mass (W). We have then to couple Eq. 0 with 

Xz = ax\z + Px\zZ -I- 7x|zr + &x\zZ T, (5) 

In this case, Xz and Yz are related to the same covariate variable, 
Z. The relations among Z,Xz and Yz are deterministic and they 
are not affected by scatter. Xz and Yz are rescaled versions of the 
latent variable Z, which can be seen as a fundamental property of 
the object, e.g., the mass of a cluster of galaxies. 


3.2 Intrinsic scatter 

The intrinsic scatter quantifies how close the data distribution is to 
strict linearity. The true properties of an astronomical object X and 
y, which we can try to measure, are intrinsically scattered with 
respect to the latent model variables Xz and Yz, which fall on a 
line without deviations but which are hidden properties. 

Observable properties are usually log-normally distributed 


about the mean scaling relations ([Stanek et al.|2010{|Fabjan et al.| 


2011 II Angulo et al.|2012| Saro et al. 2013b. This is supported by nu- 

merical simulations (Stanek et al.| 2010 | 

Fabjan et aL|2011[|Angulo 

et al.| 2012 | and observational studies 

Maughan||2007| Vikhlinin 


et al.j200^. We assume that the intrinsic scatters are Gaussian, 


P{Xi,Y,\Xz„Yz.i) =M^°{{Xz,i,Yz,i},y.,i), ( 6 ) 

where is the bivariate Gaussian distribution and Vo-,i is the 
scatter covariance matrix of the i-th cluster whose diagonal ele¬ 
ments are denoted as i and (Jy\z i’ ^^d whose off-diagonal 
elements are denoted as pxY\z,i(^x\z,i(^Y\z,i- 

The intrinsic scatter of a scaling relation is related to the de¬ 
gree of regularity of the sample. The scatter can be prominent in 
morphologically complex halos or in objects which depart from 
dynamical/hydrostatic equilibrium (|Fabjan et al.|2011||Saro et al.| 
|2013[ l. Deviations from spherically symmetry are another major 
source of scatter jLimousin et al.|2013| [Sereno et al.|2013| l. Since 
high redshift objects are more irregular and less spherical, the scat¬ 
ter is usually expected to increase with redshift[Saro et al.](|2013[l; 


[Fabjan et al.H201 f ). The degree of scatter and its evolution depends 
on the baryonic physics too ( [Fabjan et al.|201 f] !. 

The time evolution of the scatters and of their correlation can 


be modeled as ( |CoMaLit-IVt 

ox\z{z) = ax\z,oFT'''^'^’^ , (7) 

^Y\z{z) = aYiz,oF]^^'^'^‘ , ( 8 ) 

PxY\z{z) = Pxy\z,oF1'’^^''''^^ _ (9) 

where D is either the luminosity or the angular diameter distance. 

If we want to regress Y against X, we can identify X and Z. 
Equation 0 reduces to 

P(y|Z0 (10) 


3.3 Measurement uncertainties 

The measured quantities x and y are the manifest values of X and 
10 Due to observational uncertainties their relation can be ex¬ 
pressed as 

P{xi,y^\XuY,) ( 11 ) 

where is the uncertainty covariance matrix whose diagonal el¬ 
ements are denoted as <5^ ^ and <5^ j, and whose off-diagonal ele¬ 
ments are denoted as pxypSxpSyp- 

As a result of the i-th measurement process, we obtain Xi,y^ 
and the related uncertainty covariance matrix The variables 
Xz,i, Yz,i, Xi, and Yi, are unknown variables to be determined 
under the assumption of linearity. 


3.4 Malmquist bias 


Selection effects are a common concern in the astronomical anal¬ 
ysis. If only objects above an observational threshold (in the 
response variable) are included, the sample is affected by the 
Malmquist bias ( |Malmquist|| 1920| l. In this case, the relation be¬ 
tween the measured and the true values (Eq. [ID or between the 
true values and the unscattered values (Eq.[^ has to be modified. 

The bias can be modeled by truncating the probability distri¬ 
butions below the threshold The measured and the true values 
of the quantities are now related as 

P{xi,y,\Xi, Y,) (X Yi}, M5,i)U(y^.„ ), (12) 

where U is the uniform distribution null for y < y,|,;. 

The observational thresholds may not be exactly known. 
This may be the case when the quantity which the selection pro¬ 
cedure is based on differs from the quantity used in the regression. 
We have then to consider the additional relation 


^’Cy=h,i)=A^(y, 


th.obs.i’ ^yih, 


), 


(13) 


where , is the uncertainty associated to the measured threshold 


Tthobsi- Equations 1 12 and 13 1 can be combined by considering a 


sigmoid curve instead of the step function in Eq. GD- 

The conditional probability of the proxies in the sample is 
truncated too. 


P{Xi,Y,\Xz,i,Yz,^) oiN^°{{Xz,i,Yz,i},y,y,i)U{Yii,„), (14) 


where the threshold Fth.i follows the distribution 

P(Fft,i) =Af(y,„,,4)- (15) 


^ X, y, X, and Y are vectors of n elements. 
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LIRA 5 


If the thresholds are known without uncertainties, we can identify 
y,i, and Fth and the right-hand side of Eq. l |15^ is formally substi¬ 
tuted by a Dirac delta function. 


3.5 Intrinsic distribution of the covariate 

The proper modeling of the distribution of the independent variable 
is crucial ( |Kelly|2007| >. Samples considered in regression analyses 
are usually biased with respect to the parent population. Sources 
may be selected according to their properties. Furthermore, even in 
absence of selection effect the intrinsic parent population is usually 
not uniform, which may cause border effects. 

The intrinsic distribution of the independent variable Z is 
shaped by two main effects. On one hand, very massive objects are 
rarer. On the other hand, massive objects are usually strong emitters 
and are easier to be detected to very large distances. As a result, the 
shape of the distribution is fairly unimodal and it evolves with time 
( |CoMaLit-IVl l. 

The combined evolution of the completeness and of the parent 
population can be characterized through the evolution of the peak 
and of the dispersion of the distribution of the selected sample. The 
intrinsic distribution of Z can be approximated with a mixture of 
rimix time-evolving Gaussian functions (|Kelly||2007| |CoMaLit-II| 
|CoMaLit-IV) , 

^mix 

P{Z) = '^TVkJV {lJ-z,k{z),a%^k{z)) , (16) 

fe=i 

where is the probability of drawing a data point from the fc-th 
component, ~ 1- 

I assume that the mixture components have different mean and 
dispersion but share the same evolution parameters. The mean of 
each component is connected to the (redshift-evolving) observa¬ 
tional thresholds and to the intrinsic scatter of the observable quan¬ 
tity used to select the clusters, which evolves too. As a result, the 
evolution of the (mean of the) fc-th mixture component can be mod¬ 
eled as ( |CoMaLit-IV| l, 

pz,k{z) = ^lzfik +liJ.z,PzT -I- 7 ^, 2 .D logD, (17) 

where ^z,ok is the mean at the reference redshift. 

The evolution of the dispersions is related to the intrinsic scat¬ 
ter of the observable property used to select the sample. The time 
dependence can be modeled as ( |CoMaLit-IV^ 

azMA = (18) 

The proper modeling of the intrinsic distribution of the inde¬ 
pendent variable is crucial to correct for the Eddington bias, when 
the average value of an observed sample differs from the true in¬ 
trinsic average of the objects of the same class ^Eddington|1913[ 
|Jeffreys| 1938l|Eddington| 1940|CoMaLlFTl l. 

The intrinsic distribution can be alternatively modeled as a 
truncated Gaussian distribution 

p{Z) = JV {tlz,o{z),(T%fi{z)) W(Zmin, ^max). (19) 


3.6 Departure from linearity 

Physical processes are effective at different scales, which may 
cause deviation from linearity. Detection of changes in the char¬ 
acteristics of random processes is related to the sphere of statistical 
analysis called the theory of change-point detection (|Brodsky &| 
|Darkhovsky|1993[|Killick & Eckley|2014^ . 

Gravity is the driving force behind formation and evolution 


of galaxy clusters but at small scales baryonic physics can play a 
prominent role. As a result, linearity can break. This can be shaped 
with a knee in the relation, such that before the breaking scale ^knee, 
the scaling follows 


Yz = CtYlZMce + Py\ZMccZ + Tvl^.kneeT" + SY\Z,imeeZ T. (20) 

The normalization aY\z,kn<x the time evolution 7y|z,knee are 
determined by requiring equality at the transition Z^me, 


Cry|Z,knee — + {Py\Z “ PvlZ.knee) (21) 

7y|Z,knee = 7l'|Z- (22) 

The transition between the two regimes can be modeled 
through a transition function. 


fknt 


1 

1 “t“ exp \(^Z .^knee)/(laiee] 


(23) 


where the scale /knee sets the transition length. The relation over the 
full range reads 


Yz — CkY\Z + Py\zZ -f 'yY\zT -f 5y\zZ T (.^knee “ Z) fkiiee{Z) 
X [{Py\Z — Py\z ,knee) + {5y\Z ~ <5y|2_knee) T] , (24) 

The same physical processes can affect the scatter too, which I 
model as 

2ref) = 'PYlZfi + ((Zy|z,0,knee “ 0 'y| 2 ,o)/knee(.Z'). (25) 

I assume that the redshift evolution of the scatter is not affected. 


3.7 Priors 

The Bayesian statistical treatment requires the explicit declaration 
of the priors. Priors can be either conveniently non-informative, if 
we have no guess on the parameters ( |CoMaLit2I{ |CoMaLit-II[ ), or 
peaked and with small dispersion, to convey the information ob¬ 
tained with concluded experiments or theory. The parameters can 
be also frozen by fixing them with a Dirac delta-prior. Priors in 
LIRA are highly customizable. In the following, I list the default 
choices. 

Standard priors on the intercept aY\z and on the means of the 
mixture components fJ,z,o,k can be flat, 

CkY\z, Fz.o.fc ~ W(—riL, nr), (26) 

where ul is a large numbeQ 

The slopes can follow the Student’s ti distribution with one 
freedom, as suitable for uniformly distributed direction 

Py\z, 7y|z, 7 mz,d (27) 

The 7 -type parameters are set to zero when no redshift information 
is provided. The other slope parameters (/3x|z. the 5’s, the other 
7 ’s) are by default frozen to 0. They can be unpegged by setting 
other priors. In these cases, the non-informative ti prior is sug¬ 
gested. 


degree c 
angle^ 


^ In LIRA, riL is customizable and the shortcut for the prior is dunif. 
The default value is riL = 10"*. 

® In LIRA, the shortcut for this prior is dt. 
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6 M. Sereno 


For the variance, I adopted by default a scaled inverse 
distributior[^ 

<^Y\z,o, o-|,o.fe ~ Scale-inv-x^(i/,^), (28) 

with = 2/nL degrees of freedom and scale 5 = 1. This parameter 
choice makes the distribution in Eq. j28| > nearly scale-invariant. 

By default, X tallies with Z and it is unscattered, f7x\z,o = 0- 
The scatter correlation pxY\z,o is set to zero too. Otherwise, a flat 
prior can be adopted. 


PXY\Z,0 ~ W(—1, 1). 


(29) 


The parameters in the scaling Y-Z and X-Z, see Eqs. (j^and 
are redundant. If we do not know the value of Z, we cannot mea¬ 
sure all of them. By default, LIRA assumes that X is an unbiased 
proxy of Z, i.e., ax\z = 0, Px\z = Y lx\z = 0, and 5x\z = 0. 
Eor linear relations, fixing the parameters of the X-Z rather than 
the Y-Z relation is just a matter of rescaling. In absence of a direct 
measurement of Z, the bias between X and Z (i.e., ax\z 7^ 0) is 
degenerate with the estimated overall normalization of the scaling 
between Y and Z. The regression can only constrain the relative 
bias between X and Y jCoMaLihU . 

By default, LIRA uses a single Gaussian distribution to model 
the intrinsic distribution of the independent variable (rimix = 1). For 
mixtures, LIRA adopts a Dirichlet distribution for the probability 


coefficient^ (Kelly 


20071 


71 - 1 , 


' Dirichlet(l,..., 1), 


(30) 


which is equivalent to a uniform prior under the constraint 
TTfe = 1. The number of mixture components rtmix has to be 
fixed. Alternative approaches can determine the optimal number of 
Gaussian components modeling the intrinsic distribution through a 
Dirichlet process jMantz|2015| l. 

By default, the regression adopts linear models with no breaks. 
There is no knee and the slope PY\ZM<^e *-*1^ Sy\z, knee "'Th 
Py\z and ^Yjz, respectively. Otherwise, a flat prior can be adopted 
for Zknee and a Student’s-t prior for pYjZMee and 5y|2,knee> when 
applicable. The transition length is set by default to (knee = 10“^, 
which makes the transition abrupt. 

The above considerations drove the choice of the default priors 
listed in Table[T] For current data-sets, the jn parameters can be set 
safely to zero in any regression, see App.|^ The only exception is 
which is crucial to model the time-evolution of the intrinsic 
function of flux-selected sample jCoMaLit-IV^. 


and aYjzp ~ 0.1. X tallies with Z (ax|z = 0, Pxjz = 1 and 
^xjzp ~ 0). All other parameters were set to zero. 

The measurement errors were different for each data point. 
The variances in the measurement errors, Sx^ and Sy^ were drawn 
from a scaled inverse x^-distribution with 5 degrees of freedom 
( |Kelly|2007[ >. The scale parameters, which dictate the typical size 
of the measurement errors were set to 0.1^. I simulated a varying 
degree of correlations among the measurement errors. The correla¬ 
tions were drawn from a uniform distribution ranging from 0.0 to 
0.4. 

In case of a sample covering a redshift range, the above pa¬ 
rameters values were intended as the normalizations at the refer¬ 
ence redshift, Zmf = 0.01. The default time evolution was set to 
7y|z = 1 and the redshifts were drawn from a lognormal dis¬ 
tribution. I considered F7 = Ez as time factor and I computed 
cosmological distances as angular diameter distances. 

For each case study I generated 10® data sets, each one 
with risampie = 100 data points, as typical of current samples 
( |CoMaLit-IV^ . The scaling relations, the scatters, and the intrinsic 
^-distributions were recovered with LIRA. Parameter priors were 
set to the default distributions listed in Table Posterior proba¬ 
bility distributions were constrained with Markov chains generated 
with a Gibbs sampler. The LIRA package relies on the JAGS (Just 
Another Gibbs sampler) library to perform the sampling. 

For each data set, I computed the parameter medians from the 
chains and I studied the distributions of the medians of the ensem¬ 
ble. 


The simulation scheme was modified and made more complex 
if needed to highlight some aspects. On occasion, I simulated a 
skewed and evolving intrinsic distribution of the independent vari¬ 
able, scattered values of X, and time evolving scatter or slope. 

When applicable, I al so considered other publicl y available 

1996|( and LIN- 


wnen applicanie, i al so considered other p i 
methods such as BCES|^ jAkritas & Bersha^ f 


B or its gener alization to multivariate regression MLIN 

J I ’It n y-1 *-l X n Vxx o **vx , 


I Kelly|2007|l. The underlying hypotheses of these methods 


MIX 
MIX 

are well known and I only used them when applicable. I could not 
consider BCES for time-dependent populations or MLINMIX for 
time evolving scatters, Malmquist biased samples or in case of de¬ 
viation from linearity. 


4 SIMULATIONS 

I investigated how the approach detailed in Sec.j^can recover scal¬ 
ing relations in presence of noise, scatter, and selection biases. The 
approach was tested with toy models and simulated data sample. I 
set up a basic scheme, whose essential features were as follows. 

The independent variables Z were drawn from a normal dis¬ 
tribution with mean pz,o = 0, and standard deviation azp = 0.3. 
The values of Y were simulated assuming ay|z = 0, Py\z = 1 

® The default inverse x^ prior for the variance is equivalent to a Gamma 
distribution r(r = 1/nL, A = l/nk) for the precision, i.e., the inverse 
of the variance. Hence, the name prec. dgamma for this prior in LIRA, 
where prec . dgamma is applicable only to variances. Other customizable 
priors model directly the standard deviation. 

^ In LIRA, the shortcut for this prior is ddirch. 


4.1 Skewed distribution 

The accurate modeling of the intrinsic distribution of the covari¬ 
ate variable is crucial to unbiased linear regression. To test its ef¬ 
fect, I considered an asymmetric distribution. I modified the basic 
simulation scheme by drawing Z from a skew-normal distribution 
with shape parameter az.o.skew = 3.0. The location and the scale 
parameter were made to coincide with the mean and the standard 
deviation of the basic normal distribution. Results are reported in 
Tabled 
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Table 2. Scaling parameters recovered from samples whose independent 
variable follows a skewed distribution. I report the bi-weight estimators of 
the distribution of the median values of the simulated chains. 


parameter 

input 

LIRA 

t^mix ~ 1 

LIRA 

^mix ~ 3 

MLINMIX 

^mix ~ 3 

BCES 




2 — ^ref 



(^Y\Z 

[0] 

0.00 ±0.03 

0.00 ±0.03 

0.00 ±0.03 

0.00 ±0.04 

0Y\Z 

[1] 

1.02±0.12 

1.01 ±0.11 

1.00±0.11 

L02±0.15 

^Y\Z,0 

[0.1] 

0.09 ±0.03 

0.09 ± 0.03 

0.10 ±0.02 

0.13±0.01 



redshift evolution 







TTmiv = 1 


(^Y\Z 

[0] 

-0.01 ±0.10 


0.01 ±0.09 


0Y\Z 

[1] 

1.02±0.12 


0.99 ±0.11 


lY\Z 

[1] 

0.83 ±0.45 


1.02 ±0.47 


^Y\Z,Q 

10.1] 

0.09 ±0.03 


0.10 ±0.02 



4.1.1 No time evolution 

I first considered samples drawn at the same reference redshift. 
Results are summarized in Table and Fig. [T] To recover the 
parameters, I considered either a simple LIRA model with just 
one normal distribution to shape p{Z) or a mixture of three com¬ 
ponents. For comparison, I also computed parameter chains with 
LINMIX adopting a mixture of three Gaussian distributions and 
the BCES(YIX) estimator. The original work introducing BCES 
did not advocate any method to compute the intrinsic scatter, which 
I computed following [Pratt et aL| ( |2009^ . 

Input parameters are well reproduced by all methods. In this 
setting, the agreement between LIRA and LINMIX is excellent. 
This is expected since the main assumptions of the two methods 
are equivalent. Minor differences come from the different choice 
of the priors, which are of lesser importance when the data analy¬ 
sis is dominated by the likelihood and by the data. The parameter 
distributions agree very well, even though the distribution of the 
intrinsic scatter (JY\zfi from LIRA has a more pronounced tail at 
small values. This tail is not present in richer data-sets, see Sec. |4.5| 
BCES recovers well the central values of the slope and of the 
intercept but statistical uncertainties are larger. The intrinsic scatter 
estimate is biased high. 

Even though the intrinsic simulated distribution of Z is 
skewed, there is no real improvement by augmenting the number 
of mixture components, see Table As far as the intrinsic distri¬ 
bution is unimodal and the sample is not too rich, one Gaussian 
component is enough to recover the regression parameters jKelly| 
|2007||CoMaLlfTll ). 

4.1.2 Evolution with redshift 

I then considered samples covering an extended redshift range. 
Redshifts were drawn from a lognormal distribution such that In z 
has mean ln(0.3) and standard deviation 0.5. In these simulations, 
the skewed distribution of the independent variable is time evolv¬ 
ing. The location parameter of the input distribution evolves with 
the redshift as in Eq. with = 0.5 and 'yfiz,D = 0.5; 

the scale parameter is fixed, whereas the shape parameter evolves 

® http://WWW.astro.wise.edu/~mab/archive/stats/ 
stats.html 

^ http://idlastro.gsfc.nasa.gov/ftp/pro/math/ 
linmix_err.pro 

http://idlastro.gsfc.nasa.gov/ftp/pro/math/ 
mlinmix_err.pro 



U b. ... 1 ... . 
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Figure 1. Distributions of the median parameters of the scaling relation of 
samples drawn from a skewed intrinsic distribution p{Z). The blue, green, 
orange and red lines are the smoothed histograms of the distributions ob¬ 
tained with LIRA by modeling p(Z) with a single Gaussian function, with 
LIRA by adopting a mixture of 3 Gaussian functions, with LINMIX by 
adopting a mixture of 3 Gaussian functions, and with BCES, respectively. 
The vertical gray lines are set at the input parameters. From the top to the 
bottom panel: the intercept, the slope, and the intrinsic scatter. 


with redshift proportionally to E^. The input intrinsic scatter aY\z 
is redshift independent. 

I recovered the input parameters by modeling p{Z) with a sin¬ 
gle normal distribution whose mean and standard deviation evolve 
with time. The prior on yaz.F^ was set such that the inverse vari¬ 
ance follows a Gamma distribution, see Eq. l |28| l, whereas 7 ^ 2,0 
was set to zero. Even if the input p{Z) distribution is asymmetric, 
this modeling is enough to recover the time evolution of p{Z), see 
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Figure 2. The reconstructed intrinsic distribution of the independent vari¬ 
able Z at different redshifts. From top to the bottom, 2 = 0.1, 0.3, 0.5, 
1.0. The black line is the input distribution, the blue line is the median re- 
constnacted relation, the shadowed blue region encloses the l-cr confidence 
region at a given Z. For a total of nsampie = 100 data, we expect ~ 21, 
51, 20 and 1 sources in the redshift range 0.0 ^ 2 ^ 0.2, 0.2 ^ 2 ^ 0.4, 
0.4 ^ 2 ^ 0.6 and 0.9 ^ 2 ^ 1.1, respectively. 


Figure 3. Distributions of the median parameters obtained from samples 
with a skewed and time-evolving intrinsic distribution p{Z, 2 ). The blue 
(green) lines are the smoothed histograms of the distributions obtained with 
LIRA (MLINMIX). p{Z^ 2 ) was modeled with a single Gaussian function. 
The vertical gray lines are set at the input parameters. From the top to the 
bottom panel: the intercept, the slope, the time evolution, and the intrinsic 
scatter. 
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Table 3. Scaling parameters recovered from biased samples. For each case 
(listed in Col. 1), I reported on consecutive rows the values of the parame¬ 
ters obtained with regressions which do either connect or not correct for the 
bias. Reported values are the bi-weight estimators of the distribution of the 
median values of the simulated chains. 


case 


^Y\Z 

^viz,o 

input 

[0] 

[1] 

[0.1] 

Eddington bias (ctxiz A 0) 
corrected 0.00 ± 0.02 

1.01 ±0.09 

0.09 ±0.03 

biased 

0.00 ±0.02 

0.90 ±0.07 

0.14 ±0.02 

Malmquist bias 
corrected 0.01 ±0.03 

L06±0.11 

0.09 ± 0.03 

biased 

0.03 ± 0.02 

0.90 ±0.09 

0.08 ± 0.03 

Linearity break (knee) 
corrected 0.00 ± 0.03 

0.98 ±0.18 

0.10 ±0.04 

biased 

-0.05 ± 0.02 

1.31 ±0.12 

0.15 ±0.03 


Figure!^ and to get unbiased values of the scaling parameters, see 
Tableland Figurej^ 

For comparison I performed the regression with MLIN- 
MIX too. The LIRA scheme differs from the multivariate anal¬ 
ysis detailed in |Kelly| P007[ > in one major feature. LIRA mod¬ 
els the intrinsic distribution with a mixture of one-dimensional 
Gaussian components whose means and standard deviations are 
time-dependent. On the other hand, MLINMIX models the bi- 
dimensional distribution of Z and T with a mixture of bi- 
dimensional Gaussian components whose means and variances are 
not time-evolving. Notwithstanding this important difference and 
some minor differences due to the prior choice, both approaches 
can recover with good accuracy the scaling parameters, see Fig- 
ure[3l 

As far as the scaling parameters and the scatter is concerned, it 
is important to model the non-uniformity of the distribution of the 
intrinsic distribution. Details on the exact form of the distribution 
are of second order. The safer approach to model noisy ad sparse 
samples is to use the simplest model, e.g., a single normal distribu¬ 
tion for p{Z). In samples of order of one hundred of objects, there 
are just a few items at high redshift. Enforcing a more complex 
distribution, such as a skewed Gaussian, to model sparse data can 
bias the results towards a few outliers due to overfitting. Complex 
distributions are recommended only for very rich samples. 

4.2 Eddington bias 

The Eddington bias can affect the estimate of the scaling param¬ 
eters if the measurement errors on the covariate variable are not 
accounted for ( |Eddington|| 1 940^ or the X variable is a scattered 
proxy of the latent covariate Z |Sereno & Ettori|2015b^ . Here, we 
are mostly interested in the second case, which is often overlooked. 

I simulated the samples by assuming that the measurable X is 
an unbiased (axjz = 0 and /Sxjz ~ 1) but scattered ((Tx|z,o = 
0.1) proxy of Z. The data were fitted with either a corrected pro¬ 
cedure, where the intrinsic scatter ctxiz.o ts a model parameter, or 
a biased procedure with (Jx|z,o = 0. Results are summarized in 
Table|^and Fig.|g 

The corrected procedure recovers the input parameters very 
well whereas systematic errors are significant if we do nor correct 



-0.05 0.00 0.05 

ay|z 



^y|z 



Cty I Z,0 


Figure 4. Distributions of the median paiameters obtained from samples 
affected by Eddington bias. The blue lines are the smoothed histograms of 
the distributions obtained by considering the intrinsic scatter in the covariate 
variable. The green lines plot the results from a biased fit. The vertical gray 
lines are set at the input parameters. From the top to the bottom panel: the 
intercept, the slope, and the intrinsic scatter. 


for the Eddington bias. The Eddington bias makes the observed 
relation flatter and inflates the intrinsic scatter. Since I considered a 
scatter axjz independent of Z, the bias has a symmetric action and 
the pivot point of the relation does not change. The normalization is 
not affected. Statistical uncertainties on the regression parameters 
are underestimated, as usual in biased measurements. 
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Figure 5. Distributions of the median parameters obtained from samples se¬ 
lected in the response variable. The blue lines are the smoothed histograms 
of the distributions obtained by correcting for the Malmquist bias. The green 
lines plot the results from a biased fit. The vertical gray lines are set at the 
input parameters. From the top to the bottom panel: the intercept, the slope, 
and the intrinsic scatter. 


4.3 Malmquist bias 

The Malmquist bias has long been known jMalmquistj 1920^ . Still, 
it can be difficult to tackle. Proposed recipes consider the correction 
of the measured values of individual objects, which needs a guess 
on the intrinsic scatter, or the modeling through a proper definition 
of the selection efficiency in the likelihood function jVikhlinin et al.| 
| 200 ^ . 

To test the effect of the bias, I simulated the samples as in 
the standard case but 1 only kept objects whose measured response 


Table 4. Scaling parameters recovered from time evolving samples. I re¬ 
port the bi-weight estimators of the distribution of the median values of the 
simulated chains. 


parameter 

input 

unbiased 

biased 


Time evolving scatter 


“-riz 

[0] 

0.00 ± 0.02 

0.00 ± 0.02 

I^Y\Z 

[1] 

1.00 ±0.03 

1.00 ±0.03 

' 1 Y\Z 

[1] 

1.00 ±0.07 

1.00 ±0.08 

^Y\Zfi 

[0.1] 

0.100 ±0.008 

0.123 ±0.007 

'^^Y\Z,Fz 

[0.5] 

0.048 ±0.15 

[0] 

Time evolving slope 


<^Y\Z 

[0] 

0.00 ±0.05 

-0.05 ± 0.04 

/ 3 y|z 

[1] 

1.00 ±0.07 

1.07 ±0.05 

7y|z 

[1] 

0.93 ±0.59 

1.72 ±0.34 

W\z 

[1] 

0.93 ± 0.67 

[0] 


exceeded a threshold value (y > i/th = —0.3). Nearly 80 per cent 
of the items makes the cut. Results are summarized in Table and 
Fig-|5] 

The Malmquist bias makes the observed relation flatter. If the 
bias is not corrected for, the measured slope is biased toward zero 
whereas the measured intercept is biased high. The scatter is af¬ 
fected too. It can be underestimated. 


4.4 Linearity break 


Different physical processes can break the linearity of a scaling 
relation. In the formation and evolution of galaxy clusters, baryonic 
and energetic effects are relevant in small objects and can challenge 
the dominance of the gravitational force. A bent scaling relation can 
be more apt to model the process. 

I simulated a broken power law relation. I set the knee at 
•^knec = Mz,o — crz,o, i.e., ~ 16 per Cent of the sources at Z < Zimce 
follow a different scaling. The slope before the break was set at 
/3y|z,knee = 3.0. Results are summarized in Tablej^and Fig.|6 

Parameter estimates obtained with a simple linear model are 
severely biased. The model without knee cannot distinguish the two 
regimes and the measured slope is a weighted average of the two 
real slopes. Being the slope before the knee steeper in the simula¬ 
tion, the intercept estimated by the linear model is biased low. If not 
modeled, the knee strongly affects the estimated scatter. To mimic 
the break and the steeper slope, the estimated scatter is severely 
overestimated. 


4.5 Time dependent intrinsic scatter 

Usual data sets are not rich enough to measure the time evolution of 
the intrinsic scatter ( |CoMaLit-IV| l. The 7 parameters modeling the 
scatter redshift dependence, i.e., Jo-yiz f T°'v|z n’ better 
seen as noise parameters to marginalize over. 

The study of the time evolution of the scatter will be at reach 
of future surveys jLaureijs et al.|20rT| l. I then increased the num¬ 
ber of simulated sources per sample and their redshift range and I 
considered smaller observational errors. I simulated samples with 
400 items each. The scale parameters of the scaled inverse x^- 
distributions modeling the uncertainty variances 5x^ and Sy^ were 
set to 0.05^ and measurement errors were assumed to be uncorre¬ 
lated. 

Redshifts were drawn from a lognormal distribution such that 


© 0000 RAS, MNRAS 000, 000-000 

























LIRA 11 



«y|z 



/3y|z 



Oy I z,o 


Figure 6. Distributions of the median parameters of a broken power-law. 
The blue lines ai‘e the smoothed histograms of the distributions obtained by 
fitting the simulated data with a scattered broken power law. The green lines 
plot the results from a biased linear fit. The vertical gray lines are set at the 
input parameters. From the top to the bottom panel: the intercept, the slope, 
and the intrinsic scatter. 


In 2 : has mean ln(0.5) and standard deviation 0.8, i.e., ~ 19 per 
cent of the sources are at 2 > 1. The independent variables were 
drawn from a time evolving normal distribution. The mean evolves 
with redshift as in Eq. 03 with = 0.5 and = 0.5; 

the standard deviation is constant. 

The intrinsic scatter evolved with redshift as in Eq. l|^, with 
<^Y\z,0 = 0.1, = 0-5 and 'y^Yiz.n = 0 - The input intrin¬ 

sic scatter at a ~ 1 is ~ 30 per cent larger than the local value. The 
remaining parameters were set as for the other simulations. 




Yy\z 



0.0 0.2 0.4 0.6 0.8 


Figure 7. Distributions of the median parameters obtained from samples 
with time-evolving intrinsic scatter. The blue line are the smoothed his¬ 
tograms of the dish'ibutions obtained by fitting the simulated data with a 
time-dependent scatter. The green lines plot the results from a biased linear 
fit with 7o-y|2,Fj = 0. The vertical gray lines are set at the input param¬ 
eters. From the top to the bottom panel: the intercept, the slope, the time 
evolution, the intrinsic scatter, and the scatter evolution. 
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Results are summarized in Table and Fig. |7] Even if we 
neglect the scatter evolution, the estimates of the scaling parame¬ 
ters are unbiased whereas the estimated intrinsic scatter is weighted 
over the redshift range. The corrected regression can recover both 
the normalization and the time evolution of the scatter. Since the 
simulated sample is copious and observational accuracy is im¬ 
proved with respect to the other simulations, the posterior distri¬ 
bution of the intrinsic scatter is symmetric, with no prominent tail 
at small values. 


4.6 Redshift dependent slope 

The emergence of some processes at high or low redshift might 
induce a tilting slope. I simulated a scaling relation with Sy\z = 1- 
In this case, the slope changes by AI3y\z ~ 0.25 from redshift 0 
to 1. 

Redshifts were drawn from a lognormal distribution such that 
In 2 has mean ln(0.3) and standard deviation 0.5. The independent 
variables were drawn from a time evolving normal distribution with 
'ytJ-z,Fz ~ 0-5 and 'yfj.z,D = 0.5; the standard deviation is constant. 
The scale parameters of the scaled inverse -distributions model¬ 
ing the uncertainty variances 5x^ and 5y^ were set to 0.05^ and 
measurement errors were assumed to be uncorrelated. The remain¬ 
ing parameters were set as in the basic scheme. 

Results are summarized in Tablej^and Fig.[^ A correct mod¬ 
eling of the tilt is crucial to get unbiased parameters. Only the esti¬ 
mate of the intrinsic scatter is not affected. 


5 CONCLUSIONS 


Bayesian linear regression models can involve a large number of 
parameters. The analysis of the hierarchical models can be per¬ 
formed with Markov Chain Monte Carlo (MCMC) simulations. 
Since all relations in the model are expressed as conditional proba¬ 
bilities, the posterior can be efficiently explored with a Gibbs sam¬ 


pler ( |Kelly|2007[[Mantz|2015^ 

LIRA joins a number of already available routines for linear 
regression. Just to name a few of them which were proposed to 
astronomers first, the Fortran function BCES, the IDL (Interactive 
Data Language) routine LINMIX and its multivariate extension 
MLIN MIX, the Python package ASTROML ^’^[VanderPlas et ~ 
2012| l, and the R-packages LRG^ and HYPER-Fn[^ 


All of these procedures have their own specifics and strengths 
that can make them preferable under given circumstances. LIRA 
is optimized for astronomical studies. It allows the consistent treat¬ 
ment of time-evolution, intrinsic scatter, and selection effects. Red¬ 
shift has a prominent role in the proposed method. The time de¬ 
pendence of slopes, normalizations, intrinsic scatters, and correla¬ 
tions can be determined. Eurther complexity is implemented. The 
Malmquist and the Eddington biases can be addressed. Deviations 
from linearity and bent relations with knees can be accounted for. 

The degree to which selection and methodological biases can 
affect the study of current and future samples was determined with 
a series of simulations. Selection effects are an important concern. 
But they are known problems and to some extent they are known 
unknowns. We usually know whether they are affecting our sample. 


http://www.astroml.org/ 

https://cran.r-project.org/web/packages/Irgs/ 
index.html 

https : / / github. com/asgr/hyper . fit 
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Figure 8. Distributions of the median paiameters obtained from samples 
with time-evolving slope. The blue lines are the smoothed histograms of the 
distributions obtained by fitting the simulated data with a time-dependent 
slope. The green lines plot the results from a biased linear fit with Sy\z = 
0. The vertical gray lines are set at the input parameters. From the top to the 
bottom panel: the intercept, the slope, the time evolution, the slope tilt, and 
the intrinsic scatter. 
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Methodological biases can be unknown unknowns. Different pa- 
rameterizations can give excellent fits to the data with significantly 
different results. We do not know a priori the right parameteriza¬ 
tion. The problem is exacerbated by the high degree of degener¬ 
acy among involved parameters. The feature of a linear regression 
model to stay simple and to add complexity if needed is then im¬ 
portant. 
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APPENDIX A: THE LIRA R-PACKAGE 

The package LIRA is publicly available from CRAlsP^ or 
GitHuIp^ It can be installed from within R with the following com¬ 
mand 

> install.packages ( "lira", dependencies=TRUE) 

LIRA relies on the JAGS (Just Another Gibbs sampler) librar}J^ 
which must be installed separately, to perform the Gibbs sampling. 
C-F-F compilers are also needed. 

The package is loaded into the R-session with 

> library(lira) 

The linear regression analysis is performed through the 
function lira (hence the name of the package), whose output 
are Markov chains produced with a Gibbs sampler. Let x, y, 
delta. X, delta, y, covariance . xy, and z be the vectors 
storing the values of x,y, Sx, 5y, &xy andz, respectively. 

• The basic regression in Sec. |4.1.l1 was performed with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy), 

• The Gaussian mixture in Sec. |4.1.f] was implemented as 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy, 
n.mixture=3) 

• The chains analyzed in |4.1.2| to address a time-dependent 
skewed distribution p{Z, z) were obtained with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy, 
z = z, gamma.sigma.Z.Fz = "dt"), 

• The case of the Eddington bias in Sec. |4.2| was studied with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy, 
sigma.XIZ.0="prec.dgamma") 

• The regression corrected for Malmquist bias, as in Sec. |4.3| is 
performed with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy, 
y.threshold = rep(-0.3, n.data)) 
where n . data is the length of the vectors. 

• The broken power-law in Sec. |4.4| was analyzed with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.y, covariance.xy=covariance.xy, 

Z.knee="dunif(-3.0,3.0)", beta.YIZ.knee 
="dt") 

• The scaling with time dependent intrinsic scatter in Sec. |4.5| 
was analyzed with 

> mcmc <- lira(x, y, delta.x=delta.x, 

delta.y=delta.Y, z=z, gamma.sigma.XIZ.Fz="dt") 

• The scaling with time evolving slope in Sec. |4.6| was analyzed 
with 

https ; / /cran .r-project.org/web/packages/lira/ 
index.html 

https ; / / github. com/msereno/lira 

JAGS by M. Plummer is publicly available at http: / / mcmc- jags, 
sourceforge.net 
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> mcmc <- lira (x, y, delta.x=delta.x, 
delta.y=delta.y, z=z, delta.YIZ="dt") 

The LIRA package and further material and examples can 
also be found at http;//pico .bo. astro . it/-sereno/ 
LIRA/ 


APPENDIX B: REDSHIFT EVOLUTION 

In LIRA, the time-evolution of the parameters is factorized in two 
terms, one depending on Fz and one on the distance D. The fac¬ 
tor Fz can be either Ez oi {1 + z). Since the redshift evolution is 
poorly constrained in present data-sets, and since both the cosmo¬ 
logical distances and Fz are increasing function of the redshifts, 
the estimates of the evolution parameters and 70 of each scat¬ 
ter/dispersion parameter are highly degenerate. It is usually enough 
to model just one dependence. The exception is the time evolution 
of the means in the mixture modeling p{Z). 

For limited redshift baselines, the function Ez can be approx¬ 
imated with a power law of (1 -f 2 ). The value of the exponent used 
in the approximation depends on the redshift range considered and 
on the cosmological parameters. In most cases, modeling Fz as ei¬ 
ther i?z or (1 -F 2 ) has a minor effect on the estimates of the scaling 
parameters a and /3 and of the intrinsic scatters. 

For similar reasons, the choice of the cosmological distance 
is usually secondary. The angular diameter and the luminosity dis¬ 
tance differ for a factor (1 -F 2 )^ which can be approximately en- 
globed in E2^’" for limited redshift baselines. Luminosity distances 
can be preferred for flux limited samples. 
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